Model Selection in Contextual Stochastic Bandit Problems
We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, it is in general impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically.
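To make the master-base abstraction concrete, the following is a minimal sketch of the interaction loop: an adversarial master (here an EXP3-style exponential-weights learner, standing in for Corral-type masters) keeps a distribution over the base algorithms, samples one per round, plays its proposed action, and updates itself with an importance-weighted reward while the sampled base receives the raw observation. The class name MasterBaseLoop, the base-algorithm interface (select_action, update), and the environment callback env_reward_fn are illustrative assumptions, not the paper's exact algorithm or notation.

import numpy as np

class MasterBaseLoop:
    """Schematic master-base model selection loop (illustrative sketch only)."""

    def __init__(self, bases, lr=0.1, explore=0.01):
        self.bases = bases            # list of base bandit algorithms (assumed interface)
        self.lr = lr                  # master learning rate
        self.explore = explore        # uniform exploration floor
        self.log_w = np.zeros(len(bases))

    def _probs(self):
        # exponential weights over base algorithms, mixed with uniform exploration
        w = np.exp(self.log_w - self.log_w.max())
        p = w / w.sum()
        k = len(self.bases)
        return (1 - self.explore) * p + self.explore / k

    def play_round(self, context, env_reward_fn):
        p = self._probs()
        i = np.random.choice(len(self.bases), p=p)
        action = self.bases[i].select_action(context)
        reward = env_reward_fn(action)
        # importance-weighted reward drives the master's EXP3-style update
        self.log_w[i] += self.lr * reward / p[i]
        # the sampled base algorithm is updated with the raw observation
        self.bases[i].update(context, action, reward)
        return action, reward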
Supplement to "Model Selection in Contextual Stochastic Bandit Problems"
In Section D we present the proofs for Section 5.1, and in Section H we present the proofs of the lower bounds of Section 6. We also briefly outline some other direct applications of our results; for example, CORRAL achieves regret $O(\sqrt{|L|\, d\, T})$. In Appendix B.1 we reproduce the original Corral algorithm [2], and we also reproduce the EXP3.P algorithm (Figure 3.1 in [ ]). The smoothed base algorithm's expected replay regret bound implies that the total regret is bounded by $6\, U(T, \delta) \log(T)$. In Section D.2 (Applications of Proposition 5.1) we show that several algorithms are $(U, \delta, T)$-bounded (Lemma D.2).
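The smoothing transformation referenced above can be pictured as a wrapper around a base algorithm: at round t the wrapper replays a policy drawn uniformly from the policies the base proposed in rounds 1 through t, which is the mechanism that lets a high-probability $(U, \delta, T)$ regret bound for the base be transferred through an adversarial master. The sketch below is a simplified rendering under an assumed base interface (propose_policy, update); it is not the paper's exact pseudocode.

import random

class SmoothedBase:
    """Illustrative smoothing wrapper for a base bandit algorithm (sketch only)."""

    def __init__(self, base):
        self.base = base
        self.policies = []   # history of per-round policies (assumed to be callables)

    def select_action(self, context):
        # the base proposes its current policy for this round
        current_policy = self.base.propose_policy()
        self.policies.append(current_policy)
        # replay a policy sampled uniformly from rounds 1..t (the smoothing step)
        replayed = random.choice(self.policies)
        return replayed(context)

    def update(self, context, action, reward):
        # observed feedback is passed through to the underlying base algorithm
        self.base.update(context, action, reward)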
Review for NeurIPS paper: Model Selection in Contextual Stochastic Bandit Problems
Weaknesses: I believe the paper could present more context around the results in Sections 4.2 through 4.4. What other results exist in these settings? Similarly, the numerical experiments are not discussed at all and are hard to interpret; I find it difficult to draw conclusions from them. I would recommend moving the description of the algorithm ahead of Section 4, since Section 4 is hard to judge on a first read when the algorithm generating the claimed results has not yet been presented.